12 research outputs found

    Speculation and negation detection in french biomedical corpora

    International audience. In this work, we address the detection of negation and speculation, and of their scope, in French biomedical documents. Both phenomena have been observed to play an important role and to provide crucial clues for other NLP applications. Our methods are based on CRFs and BiLSTMs. Using CRFs, we reach up to 97.21% and 91.30% F-measure for the detection of negation and speculation cues, respectively. For scope detection, we reach up to 90.81% and 86.73% F-measure on negation and speculation, respectively, using a BiLSTM-CRF fed with word embeddings.
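
    The abstract above reports its results as F-measures. As a rough illustration only, the sketch below computes a balanced F-measure (F1) from sets of gold and predicted cue positions. The exact-match set representation and the function name `f_measure` are assumptions made for this example; the abstract does not state how matches were counted.

```python
def f_measure(gold: set, predicted: set) -> float:
    """Balanced F-measure (F1) between gold and predicted annotations.

    Assumption: an annotation counts as correct only if it matches a gold
    annotation exactly (here, identical (start, end) token offsets).
    """
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical cue spans given as (start_token, end_token) pairs.
gold_cues = {(0, 1), (5, 6), (9, 10)}
predicted_cues = {(0, 1), (5, 6), (7, 8)}
score = f_measure(gold_cues, predicted_cues)
```

    With two of three predictions correct and two of three gold cues found, precision and recall are both 2/3, so the F-measure is also 2/3.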

    Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora

    International audience. Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. The task is especially important in the biomedical domain, where negation plays a major role. In this work, we propose two main contributions. First, we work with two languages that have been poorly addressed up to now: Brazilian Portuguese and French. We developed new corpora for these two languages and manually annotated them with negation cues and their scope. Second, we propose supervised machine learning methods for the automatic detection of negation cues and of their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We assume that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.

    Text mining and information extraction in clinical data

    With the introduction of clinical data warehouses, more and more health data are available for research purposes. While a significant part of these data exists in structured form, much of the information contained in electronic health records is available as free text that can be used for many tasks. In this manuscript, two tasks are explored: the multi-label classification of clinical texts and the detection of negation and uncertainty. The first is studied in cooperation with the Rennes University Hospital, owner of the clinical texts that we use, while, for the second, we use publicly available biomedical texts that we annotate and release free of charge. In order to solve these tasks, we propose several approaches based mainly on deep learning algorithms, used in supervised and unsupervised learning settings.

    Fouille de texte et extraction d'informations dans les données cliniques

    With the introduction of clinical data warehouses, more and more health data are available for research purposes. While a significant part of these data exists in structured form, much of the information contained in electronic health records is available as free text that can be used for many tasks. In this manuscript, two tasks are explored: the multi-label classification of clinical texts and the detection of negation and uncertainty. The first is studied in cooperation with the Rennes University Hospital, owner of the clinical texts that we use, while, for the second, we use publicly available biomedical texts that we annotate and release free of charge. In order to solve these tasks, we propose several approaches based mainly on deep learning algorithms, used in supervised and unsupervised learning settings.

    Détection de l'incertitude et de la négation : un état de l'art

    National audience. One of the long-term goals of our work is to turn a corpus of medical documents into structured data that is easier to exploit. It is therefore necessary not only to detect the medical concepts mentioned, but also to add a process capable of identifying the context in which each medical concept is used. In this article, we mainly review the supervised machine learning systems that have been proposed for negation and uncertainty detection. Over the last decade, work on detecting uncertainty and negation in English texts has given satisfying results; however, there is still considerable room for improvement.

    CAS: French Corpus with Clinical Cases

    International audience. Textual corpora are extremely important for various NLP applications, as they provide the information necessary for creating, setting and testing these applications and the corresponding tools. They are also crucial for designing reliable methods and obtaining reproducible results. Yet, in some areas, such as the medical area, confidentiality or ethical reasons make it complicated or even impossible to access textual data representative of those produced in these areas. We propose the CAS corpus, built with clinical cases as reported in the published scientific literature in French. We describe this corpus, which currently contains over 397,000 word occurrences, and its existing linguistic and semantic annotations.

    Détection de la négation : corpus français et apprentissage supervisé

    National audience. Automatic negation detection is often a prerequisite in information extraction systems, particularly in the biomedical domain. This article presents two contributions related to this problem. First, we present a corpus made up of excerpts from clinical trial protocols in French, devoted to patient inclusion criteria. Negation cues and their scopes have been manually annotated in it. Second, we present a supervised neural approach for extracting this information automatically. The approach is validated by applying it to state-of-the-art English data, on which it shows very good results; applied to our French data, it achieves comparable performance.

    CAS: corpus of clinical cases in French

    International audience. Background: Textual corpora are extremely important for various NLP applications, as they provide the information necessary for creating, setting and testing those applications and the corresponding tools. They are also crucial for designing reliable methods and obtaining reproducible results. Yet, in some areas, such as the medical area, confidentiality or ethical reasons make it complicated or even impossible to access representative textual data. We propose the CAS corpus, built with clinical cases as reported in the published scientific literature in French. Results: Currently, the corpus contains 4,900 clinical cases in French, totaling nearly 1.7M word occurrences. Some clinical cases are associated with discussions. A subset of the cases is enriched with morpho-syntactic (PoS-tagging, lemmatization) and semantic (UMLS concepts, negation, uncertainty) annotations. The corpus is being continuously enriched with new clinical cases and annotations. The CAS corpus has been compared with similar clinical narratives. When computed on tokenized and lowercased words, the Jaccard index indicates that the similarity between clinical cases and narratives reaches up to 0.9727. Conclusion: We assume that the CAS corpus can be effectively exploited for the development and testing of NLP tools and methods. In addition, the corpus will be used in NLP challenges and distributed to the research community.
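
    The abstract above compares corpora with the Jaccard index over tokenized, lowercased words. The sketch below shows one way such a score could be computed. The regular-expression tokenizer and the function name `jaccard_index` are assumptions made for this example, not the authors' actual pipeline.

```python
import re


def jaccard_index(text_a: str, text_b: str) -> float:
    """Jaccard index between the word sets of two texts.

    Assumption: tokens are runs of word characters extracted after
    lowercasing; the abstract does not specify the tokenizer used.
    """
    words_a = set(re.findall(r"\w+", text_a.lower()))
    words_b = set(re.findall(r"\w+", text_b.lower()))
    union = words_a | words_b
    if not union:
        # Two empty texts share no vocabulary; return 0.0 by convention.
        return 0.0
    # Size of the shared vocabulary divided by the size of the joint vocabulary.
    return len(words_a & words_b) / len(union)


# Hypothetical inputs: 2 shared words out of 4 distinct words overall.
similarity = jaccard_index("a b c", "B C D")
```

    Because the index is computed on sets, word frequency is ignored: the score reflects only how much vocabulary the two texts share.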